Abstract

We introduce an approximate search algorithm for fast maximum
a posteriori probability estimation in probabilistic programs,
which we call Bayesian ascent Monte Carlo (BaMC).
Probabilistic programs represent probabilistic models with
a varying number of mutually dependent finite, countable, and
continuous random variables. BaMC is an anytime MAP
search algorithm applicable to any combination of random
variables and dependencies. We compare BaMC to other
MAP estimation algorithms and show that BaMC is faster
and more robust on a range of probabilistic models.


Introduction 


Many Artificial Intelligence problems, such as approximate
planning in MDPs and POMDPs, probabilistic abductive
reasoning (Raghavan 2011), or utility-based recommendation
(Shani and Gunawardana 2009), can be formulated as
MAP estimation problems. The framework of probabilistic
inference (Pearl 1988) proposes solutions to a wide range
of Artificial Intelligence problems by representing them as
probabilistic models. Efficient domain-independent algorithms
are available for several classes of representations, in
particular for graphical models (Lauritzen 1996), where
inference can be performed either exactly or approximately.
However, graphical models typically require that the full
graph of the model be represented explicitly, and are not
powerful enough for problems where the state space is
exponential in the problem size, such as the generative models
common in planning (Szorenyi, Kedenburg, and Munos).


Probabilistic programs (Goodman et al. 2008; Wood,
van de Meent, and Mansinghka 2014) can represent arbitrary
probabilistic models. In addition to expressive power,
probabilistic programming separates modeling and inference,
allowing the problem to be specified in a simple language
which does not assume any particular inference technique.
Recent success in PMCMC methods enables efficient sampling
from posterior distributions with few restrictions on
the structure of the models (Wood, van de Meent, and
Mansinghka 2014; Paige et al. 2014).

However, an efficient sampling scheme for finding a MAP
estimate would differ from the scheme for inferring
the posterior distribution: only a single instantiation of
the model's variables, rather than their joint distribution, must
be found. This difference is reminiscent of the difference between
simple and cumulative reward optimization in many settings,
for example, in Multi-armed bandits (Stoltz, Bubeck, and
Munos 2011): when all samples contribute to the total reward,
the algorithms are said to optimize the cumulative
reward, which is the classical Multi-armed bandit setting.
Alternatively, when only the quality of the final choice matters,
the algorithms are said to optimize the simple reward.
This setting is often called a search problem. Previous research
demonstrated that different sampling schemes work
better for either cumulative or simple reward, and algorithms
which are optimal in one setting can be suboptimal in the
other (Hay et al. 2012).

In this paper, we introduce a sampling-based search algorithm
for fast MAP estimation in probabilistic programs,
Bayesian ascent Monte Carlo (BaMC), which can be used
with any combination of finite, countable and continuous
random variables and any dependency structure. We empirically
compare BaMC to other feasible MAP estimation
algorithms, showing that BaMC is faster and more robust.


Preliminaries 

Probabilistic Programming 

Probabilistic programs are regular programs extended by
two constructs (Gordon et al. 2014): a) the ability to draw
random values from probability distributions, and b) the
ability to condition values computed in the programs on
probability distributions. A probabilistic program implicitly
defines a probability distribution over program state.
Formally, a probabilistic program is a stateful deterministic
computation V with the following properties:

• Initially, V expects no arguments. 

• On every invocation, V returns either a distribution F, a
distribution and a value (G, y), a value z, or ⊥.

• Upon returning F, V expects a value x drawn from F as 
the argument to continue. 

• Upon returning (G, y) or z, V is invoked again without 
arguments. 

• Upon returning ⊥, V terminates.




































A program is run by calling V repeatedly until termination.
Every run of the program implicitly produces a sequence
of pairs $(F_i, x_i)$ of distributions and values drawn
from them. We call this sequence a trace and denote it by x.
Program output is deterministic given the trace.

By definition, the probability of a trace is proportional to
the product of the probability of all random choices x and
the likelihood of all observations y:

$$p_V(x \mid y) \propto \prod_{i=1}^{|x|} p_{F_i}(x_i) \prod_{j=1}^{|y|} p_{G_j}(y_j) \qquad (1)$$

The objective of inference in a probabilistic program V is to
discover the distribution of program output.¹
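
The execution protocol and Equation 1 can be made concrete with a small sketch. It is only an illustration under stated assumptions, not the authors' implementation: the program V is modelled as a Python generator, the tags "sample", "observe" and "output" stand for returning F, (G, y) and z, running off the end of the generator plays the role of ⊥, and the `Normal` class, the toy model and the `run` helper are all hypothetical names introduced here.

```python
import math
import random


class Normal:
    """Minimal normal distribution with sampling and log-density."""

    def __init__(self, mean, std):
        self.mean, self.std = mean, std

    def sample(self, rng):
        return rng.gauss(self.mean, self.std)

    def log_prob(self, x):
        z = (x - self.mean) / self.std
        return -0.5 * z * z - math.log(self.std) - 0.5 * math.log(2 * math.pi)


def program():
    """Toy model (an assumption): mu ~ N(0, 1), two observations y ~ N(mu, 1)."""
    mu = yield ("sample", Normal(0.0, 1.0))    # returns F, expects x
    for y in (0.5, 1.5):
        yield ("observe", Normal(mu, 1.0), y)  # returns (G, y)
    yield ("output", mu)                       # returns z
    # Falling off the end corresponds to returning the terminal value.


def run(program, rng):
    """Run the program once, drawing every x_i from F_i.

    Returns the trace of (F_i, x_i) pairs, the log of the unnormalized
    trace probability (the log of the right-hand side of Equation 1),
    and the list of program outputs.
    """
    gen = program()
    trace, outputs, log_weight = [], [], 0.0
    try:
        result = next(gen)
        while True:
            if result[0] == "sample":
                dist = result[1]
                x = dist.sample(rng)
                log_weight += dist.log_prob(x)   # log p_{F_i}(x_i)
                trace.append((dist, x))
                result = gen.send(x)
            elif result[0] == "observe":
                _, dist, y = result
                log_weight += dist.log_prob(y)   # log p_{G_j}(y_j)
                result = next(gen)
            else:
                outputs.append(result[1])
                result = next(gen)
    except StopIteration:
        pass
    return trace, log_weight, outputs
```

Calling `run(program, random.Random(0))` yields one trace with a single random choice; its log weight is the sum of the prior log-density of the drawn value and the two observation log-likelihoods.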

Several implementations of general probabilistic programming
languages are available (Goodman et al. 2008;
Wood, van de Meent, and Mansinghka 2014). Inference
is usually performed using Monte Carlo sampling algorithms
for probabilistic programs (Wingate, Stuhlmüller,
and Goodman 2011; Wood, van de Meent, and Mansinghka
2014; Paige et al. 2014). While some algorithms are better
suited for certain problem types, most can be used with any
valid probabilistic program.


Maximum a Posteriori Probability Inference 

Maximum a posteriori probability (MAP) inference is the
problem of finding an assignment to the variables of a
probabilistic model that maximizes their joint posterior
probability (Murphy 2012). Sometimes, a more general problem of
marginal MAP inference is solved, in which the distribution
is marginalized over some of the variables (Doucet,
Godsill, and Robert 2002; Maua and de Campos 2012). In
this paper we consider the simpler setting of MAP estimation,
where an assignment for all variables is sought; however,
the proposed algorithms can be extended to marginal MAP
inference.

For certain graphical models the MAP assignment can be
found exactly (Park and Darwiche 2003; Sun, Druzdzel, and
Yuan 2007). However, in most advanced cases, e.g. models
expressed by probabilistic programs, MAP inference is
intractable, and approximate algorithms such as Stochastic
Expectation-Maximization (Wei and Tanner 1990) or Simulated
Annealing (Andrieu and Doucet 2000) are used.

¹Note that this conceptualization of a probabilistic program
corresponds, for example, to the approach in (Goodman and
Stuhlmüller).

Simulated Annealing (SA) for MAP inference constitutes
a universal approach which is based on Monte Carlo sampling.
Simulated Annealing is a non-homogeneous version
of the Metropolis-Hastings algorithm in which the acceptance
probability is gradually changed in analogy with the physical
process of annealing (Kirkpatrick, Gelatt, and Vecchi
1983). Convergence of Simulated Annealing algorithms depends
on the properties of the annealing schedule, that is, the rate
at which the acceptance probability changes in the course
of the algorithm (Lundy and Mees 1986). When the rate is
too low, the SA algorithm may take too many iterations to
find the global maximum. When the rate is too high, the
algorithm may fail to find the global maximum at all and
get stuck in a local maximum instead. Tuning the annealing
schedule is necessary to achieve reasonable performance
with SA, and the best schedule depends on both the problem
domain and model parameters.
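
The role of the annealing rate can be illustrated with a minimal sketch of SA with an exponential schedule. Everything in it is an assumption made for illustration: the bimodal `log_density`, the Gaussian random-walk proposal, the `step` size, and the schedule form T_k = rate^k are not taken from the implementations evaluated in this paper.

```python
import math
import random


def log_density(x):
    """Hypothetical unnormalized log-density: a low mode near -2, a tall one near 3."""
    low = 0.3 * math.exp(-0.5 * (x + 2.0) ** 2)
    tall = 0.7 * math.exp(-0.5 * ((x - 3.0) / 0.5) ** 2)
    return math.log(low + tall + 1e-300)  # the constant guards against log(0)


def simulated_annealing(log_p, x0, iterations, rate, rng, step=1.0):
    """Maximize log_p using a temperature that decays as rate**k."""
    x, lp = x0, log_p(x0)
    best_x, best_lp = x, lp
    temperature = 1.0
    for _ in range(iterations):
        candidate = x + rng.gauss(0.0, step)
        cand_lp = log_p(candidate)
        # Metropolis acceptance for the tempered target p(x)**(1/T):
        # uphill moves are always accepted, downhill moves less and
        # less often as the temperature drops.
        if cand_lp >= lp or rng.random() < math.exp((cand_lp - lp) / temperature):
            x, lp = candidate, cand_lp
            if lp > best_lp:
                best_x, best_lp = x, lp
        temperature = max(temperature * rate, 1e-12)
    return best_x, best_lp
```

With a rate close to 1 the chain keeps accepting downhill moves for longer and can cross between modes; with a small rate it quickly becomes a greedy hill climber that stays in whichever mode it started near.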


Bayesian Ascent Monte Carlo 

We introduce here an approximate search algorithm for fast
MAP estimation in probabilistic programs, Bayesian ascent
Monte Carlo (BaMC). The algorithm draws inspiration from
Monte Carlo Tree Search (Kocsis and Szepesvari 2006). Unlike
Simulated Annealing, BaMC uses the information about
the probability of every sample to propose assignments in
future samples, a kind of adaptive proposal in Monte Carlo
inference. BaMC differs from known realizations of MCTS
in a number of ways.


• The first difference between BaMC and MCTS as commonly
implemented in online planning or game playing
follows from the nature of inference in probabilistic models.
In online planning and games, the search is performed
with the root of the search corresponding to the current
state of the agent. After a certain number of iterations,
MCTS commits to an action, and restarts the search for
the action to take in the next state. In probabilistic program
inference, an assignment to all variables must be determined
simultaneously, hence the sampling is always performed
for all variables in the model.

• Additionally, probabilistic programs often involve a
combination of finite, infinite countable, and infinite
continuous random variables. Variants of MCTS for continuous
variables were developed (Couetoux et al. 2011); however,
mixing variables of different types in the same search is
still an open problem. BaMC uses open randomized probability
matching, also introduced here, to handle all variables
in a unified way independently of variable type.

• Finally, BaMC is an anytime algorithm. Since BaMC
searches for an estimate of the maximum of the posterior
probability, every sample with a greater posterior probability
than that of all previous samples is an improved estimate.
BaMC outputs all such samples. At any time, the currently
available solution is a valid MAP estimate of the model,
and its quality improves as sampling goes on.


BaMC (Algorithm 1) maintains beliefs about the probability
distribution of the log weight (the logarithm of the unnormalized
probability defined by Equation 1) of the trace for each value
of each random variable in the probabilistic program. At
every iteration (Algorithm 1) the algorithm runs the
probabilistic program (lines 4-18) and computes the log weight
of the trace. If the log weight of the trace is greater than
the previous maximum log weight, the maximum log weight
is updated, and the trace is output as a new MAP estimate
(lines 19-21). Finally, the beliefs are updated from
the log weight of the sample (lines 22-23).

The ability of BaMC to discover new, improved MAP estimates
depends on the way values are selected for random
variables (line 7). On one hand, new values should be drawn
to explore the domain of the random variable. On the other
hand, values which were tried previously and resulted in a
high-probability trace should be re-selected sufficiently often
to discover high-probability assignments conditioned on
these values.

Algorithm 1 Monte Carlo search for MAP assignment.

 1: max-log-weight ← −∞
 2: loop
 3:   trace ← (), log-weight ← 0
 4:   result ← V()  /* probabilistic program */
 5:   loop
 6:     if result is F_i then
 7:       x_i ← SELECTVALUE(i, F_i)
 8:       log-weight ← log-weight + log p_{F_i}(x_i)
 9:       log-weight_i ← log-weight
10:       PUSH(trace, (F_i, x_i))
11:       result ← V(x_i)
12:     else if result is (G_j, y_j) then
13:       log-weight ← log-weight + log p_{G_j}(y_j)
14:       result ← V()
15:     else if result is z_k then
16:       OUTPUT(z_k)
17:       result ← V()
18:     else break
19:   if log-weight > max-log-weight then
20:     OUTPUT(trace)
21:     max-log-weight ← log-weight
22:   for i in |trace| downto 1 do
23:     UPDATE(i, log-weight − log-weight_i)


Open Randomized Probability Matching 

Randomized probability matching (RPM), also called
Thompson sampling (Thompson 1933), is used in many contexts
where choices are made based on empirically determined
choice rewards. It is a selection scheme that maintains
beliefs about reward distributions of every choice, selects
a choice with the probability that the average reward
of the choice is the highest one, and revises beliefs based on
observed rewards. Bayesian belief revision is usually used
with randomized probability matching. Selection can be
implemented efficiently by drawing a single sample from the
belief distribution of the average reward for every choice, and
selecting the choice with the highest sample value (Scott 2010;
Agrawal and Goyal 2012).
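
The efficient selection rule can be sketched in a few lines. The sketch assumes a finite set of choices and normal beliefs about the average reward, summarized per choice by a sample size, mean and variance; the function name `thompson_select` and the layout of `stats` are hypothetical.

```python
import math
import random


def thompson_select(stats, rng):
    """Select a choice by randomized probability matching.

    stats maps each choice to (n, mean, var): the number of observed
    rewards, their sample mean and their sample variance. One sample of
    the average reward is drawn per choice from N(mean, var / n), and the
    choice with the highest sample wins.
    """
    best_choice, best_draw = None, -math.inf
    for choice, (n, mean, var) in stats.items():
        draw = rng.gauss(mean, math.sqrt(var / n))
        if draw > best_draw:
            best_choice, best_draw = choice, draw
    return best_choice
```

A choice is selected exactly as often as its sampled average reward comes out on top, so well-explored, clearly inferior choices are selected rarely, while choices with few observations (large var / n) are still tried occasionally.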


Here we extend randomized probability matching to domains
of infinite or unknown size. We call this generalized
version open randomized probability matching (ORPM)
(Algorithm 2). ORPM is given a choice distribution, and
selects choices from the distribution to maximize the total
reward. ORPM does not know or assume the type of
the choice distribution, but rather handles all distribution
types in a unified way. Like RPM, ORPM maintains beliefs
about the rewards of every choice. First, ORPM uses
RPM to guess the reward distribution of a randomly drawn
choice (lines 6-13). Then, ORPM uses RPM again to select
a choice based on the beliefs about each choice, including a
randomly drawn choice (lines 14-23). If the selected choice
is a randomly drawn choice, the choice is drawn from the
choice distribution (line 21) and added to the set of choices.
Finally, an action is executed based on the choice (line 25)
and the reward belief of the choice is updated based on the
reward obtained from the action.

Algorithm 2 Open randomized probability matching.

 1: choices ← ()
 2: loop
 3:   /* x_i ← SELECTVALUE(i, F_i) */
 4:   if choices = () then best-choice ← random-choice
 5:   else
 6:     best-reward ← −∞
 7:     best-belief ← ⊥
 8:     for choice in choices do
 9:       reward ← DRAW(RewardBelief(choice))
10:       if reward > best-reward then
11:         best-reward ← reward
12:         best-belief ← MeanRewardBelief(choice)
13:     best-reward ← DRAW(best-belief)
14:     best-choice ← random-choice
15:     for choice in choices do
16:       reward ← DRAW(MeanRewardBelief(choice))
17:       if reward > best-reward then
18:         best-reward ← reward
19:         best-choice ← choice
20:   if best-choice = random-choice then
21:     best-choice ← DRAW(ChoiceDistribution)
22:     RewardBelief(best-choice) ← PriorBelief
23:     choices ← APPEND(choices, best-choice)
24:   /* result ← V() */
25:   reward ← EXECUTE(best-choice)
26:   /* UPDATE(i, log-weight − log-weight_i) */
27:   UPDATEREWARDBELIEF(best-choice, reward)
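
Algorithm 2 can be sketched in Python as follows. This is a reading of the pseudocode under several assumptions, not a reference implementation: beliefs are normal with running sample mean and variance, `PRIOR_STD` is a made-up spread used while a choice has fewer than two observed rewards, a redrawn value that is already known keeps its existing belief instead of being reset to the prior, and the class and function names are hypothetical.

```python
import math
import random

PRIOR_STD = 10.0  # assumed spread used until a choice has two observed rewards


class Belief:
    """Running sample mean and variance of the rewards of one choice."""

    def __init__(self):
        self.n, self.mean, self.m2 = 0, 0.0, 0.0

    def update(self, reward):
        # Welford's online update of the mean and the sum of squared deviations.
        self.n += 1
        delta = reward - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (reward - self.mean)

    def _std(self):
        if self.n > 1:
            return math.sqrt(max(self.m2, 0.0) / (self.n - 1))
        return PRIOR_STD

    def draw_reward(self, rng):
        return rng.gauss(self.mean, self._std())

    def draw_mean_reward(self, rng):
        return rng.gauss(self.mean, self._std() / math.sqrt(max(self.n, 1)))


class ORPM:
    def __init__(self, draw_choice):
        self.draw_choice = draw_choice  # callable: rng -> value from the choice distribution
        self.beliefs = {}               # known choices and their reward beliefs

    def select(self, rng):
        best_choice = None  # None stands for "random-choice"
        if self.beliefs:
            # Lines 6-13: guess the mean reward of a not yet drawn choice.
            guess = max(self.beliefs.values(), key=lambda b: b.draw_reward(rng))
            best_reward = guess.draw_mean_reward(rng)
            # Lines 14-19: let every known choice compete with that guess.
            for choice, belief in self.beliefs.items():
                reward = belief.draw_mean_reward(rng)
                if reward > best_reward:
                    best_reward, best_choice = reward, choice
        if best_choice is None:
            # Lines 20-23: draw a fresh choice and start tracking it.
            best_choice = self.draw_choice(rng)
            self.beliefs.setdefault(best_choice, Belief())
        return best_choice

    def update(self, choice, reward):
        # Line 27: revise the belief from the observed reward.
        self.beliefs[choice].update(reward)


def demo(iterations=500, seed=0):
    """Toy use: search for the x ~ N(0, 3) that maximizes -(x - 2)^2."""
    rng = random.Random(seed)
    orpm = ORPM(lambda r: r.gauss(0.0, 3.0))
    best_x, best_reward = None, -math.inf
    for _ in range(iterations):
        x = orpm.select(rng)
        reward = -(x - 2.0) ** 2
        orpm.update(x, reward)
        if reward > best_reward:
            best_x, best_reward = x, reward
    return best_x, best_reward, len(orpm.beliefs)
```

The set of known choices grows only when the guessed reward of a fresh draw beats the sampled mean reward of every known choice, which is how the same rule covers finite, countable and continuous choice distributions.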

The final form of Bayesian Ascent Monte Carlo is obtained
by combining Monte Carlo search for MAP assignment
(Algorithm 1) and open randomized probability matching
(Algorithm 2). Selecting a value in line 7 of Algorithm 1
corresponds to lines 4-23 of Algorithm 2. Line 27 of
Algorithm 2 is performed at every iteration of the loop in
lines 22-23 of Algorithm 1. The reliance on randomized
probability matching allows BaMC to be implemented
efficiently without any tunable parameters.
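
The combination can be sketched end to end on a toy model. The sketch is an illustration under stated assumptions rather than the authors' implementation: the program is a Python generator, selection points are identified by their position i in the trace, beliefs are normal with an assumed fallback spread `PRIOR_STD` for choices with fewer than two rewards, and all names are hypothetical.

```python
import math
import random


class Normal:
    def __init__(self, mean, std):
        self.mean, self.std = mean, std

    def sample(self, rng):
        return rng.gauss(self.mean, self.std)

    def log_prob(self, x):
        z = (x - self.mean) / self.std
        return -0.5 * z * z - math.log(self.std * math.sqrt(2 * math.pi))


class Belief:
    PRIOR_STD = 10.0  # assumed spread before two rewards have been observed

    def __init__(self):
        self.n, self.mean, self.m2 = 0, 0.0, 0.0

    def update(self, reward):
        self.n += 1
        delta = reward - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (reward - self.mean)

    def draw(self, rng, of_mean):
        std = math.sqrt(max(self.m2, 0.0) / (self.n - 1)) if self.n > 1 else self.PRIOR_STD
        if of_mean:
            std /= math.sqrt(max(self.n, 1))
        return rng.gauss(self.mean, std)


def select_value(beliefs, dist, rng):
    """ORPM selection (Algorithm 2, lines 4-23) at one selection point."""
    best_choice = None  # None stands for the randomly drawn choice
    if beliefs:
        guess = max(beliefs.values(), key=lambda b: b.draw(rng, False))
        best_reward = guess.draw(rng, True)
        for choice, belief in beliefs.items():
            reward = belief.draw(rng, True)
            if reward > best_reward:
                best_reward, best_choice = reward, choice
    if best_choice is None:
        best_choice = dist.sample(rng)
        beliefs.setdefault(best_choice, Belief())
    return best_choice


def program():
    """Toy model (an assumption): mu ~ N(0, 3), observations y ~ N(mu, 1)."""
    mu = yield ("sample", Normal(0.0, 3.0))
    for y in (1.0, 2.0, 3.0):
        yield ("observe", Normal(mu, 1.0), y)


def bamc(program, iterations, rng):
    """Algorithm 1 with ORPM as SELECTVALUE; returns the improving estimates."""
    beliefs = []  # one dict of value -> Belief per selection point i
    max_log_weight, estimates = -math.inf, []
    for _ in range(iterations):
        gen, trace, marks, log_weight = program(), [], [], 0.0
        try:
            result = next(gen)
            while True:
                if result[0] == "sample":
                    i, dist = len(trace), result[1]
                    if i == len(beliefs):
                        beliefs.append({})
                    x = select_value(beliefs[i], dist, rng)
                    log_weight += dist.log_prob(x)
                    marks.append(log_weight)  # log-weight_i
                    trace.append(x)
                    result = gen.send(x)
                else:  # "observe"
                    _, dist, y = result
                    log_weight += dist.log_prob(y)
                    result = next(gen)
        except StopIteration:
            pass
        if log_weight > max_log_weight:
            max_log_weight = log_weight
            estimates.append((list(trace), log_weight))
        for i in reversed(range(len(trace))):
            beliefs[i][trace[i]].update(log_weight - marks[i])
    return estimates
```

Each element of the returned list is a trace together with its log weight, and the weights are strictly increasing, so the last element is the current MAP estimate at the time the loop stopped.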


Belief Maintenance 

In probabilistic models with continuous random choices
$\log p_V(x \mid y)$ (Equation 1) may be both positive and
negative, and is in general unbounded on either side; therefore we
opted for the normal distribution to represent beliefs about
choice rewards. Since parameters of reward distributions
vary within wide bounds, and reasonable initial estimates are
hard to guess, we used an uninformative prior belief, which
is in practice equivalent to maintaining sample mean and
variance estimates for each choice, and using the estimates
as the parameters of the belief distribution. Let us denote by
$r_{ij}$ a random variable corresponding to the reward attributed
to random choice $j$ at selection point $i$. Then the reward
belief distribution is

$$r_{ij} \sim Bel(r_{ij}) = \mathcal{N}\big(\mathrm{E}(r_{ij}), \mathrm{Var}(r_{ij})\big) \qquad (2)$$

where $\mathrm{E}(r_{ij})$ and $\mathrm{Var}(r_{ij})$ are the sample
mean and variance of the reward, respectively.

In the same uninformative prior setting, the mean reward
belief used by randomized probability matching is

$$Bel(\bar{r}_{ij}) = \mathcal{N}\Big(\mathrm{E}(r_{ij}), \frac{\mathrm{Var}(r_{ij})}{n}\Big) \qquad (3)$$

where $n$ is the sample size of $r_{ij}$ (Gelman et al. 2003).

Figure 1: HMM with 16 observed states and unknown transition
probabilities.

Figure 2: Probabilistic Deterministic Infinite Automata on
the first chapter of Alice's Adventures in Wonderland.

These beliefs can be computed efficiently, and provide
sufficient information to guide MAP search. Informative
priors on reward distributions can be imposed to improve
convergence when available, such as in the approach described
in (Bai, Wu, and Chen 2013).
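
As a small worked illustration of the two beliefs, suppose a choice has received three rewards. The reward values are made up for the example, the unbiased (n - 1) sample variance is an assumption, and the variance of the mean reward belief is taken to be the sample variance divided by n, which is how we read Equation 3.

```latex
\begin{align*}
r_{ij} &\in \{-3,\, -1,\, -2\}, \qquad n = 3, \\
\mathrm{E}(r_{ij}) &= \tfrac{1}{3}\,(-3 - 1 - 2) = -2, \\
\mathrm{Var}(r_{ij}) &= \tfrac{1}{2}\left[(-1)^2 + 1^2 + 0^2\right] = 1, \\
Bel(r_{ij}) &= \mathcal{N}(-2,\, 1), \qquad
Bel(\bar{r}_{ij}) = \mathcal{N}\!\left(-2,\, \tfrac{1}{3}\right).
\end{align*}
```

The belief about the mean reward narrows as more rewards are observed, while the belief about a single reward keeps the spread of the observed rewards.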


Empirical Evaluation 

We present here empirical results for MAP estimation on
two problems: a Hidden Markov Model with 16 observable
states and unknown transition probabilities (Figure 1),
and Probabilistic Deterministic Infinite Automata (Pfau,
Bartlett, and Wood 2010), applied to the first chapter of
"Alice's Adventures in Wonderland" as the training data.
Both represent, for purposes of MAP estimation, probabilistic
models of reasonable size, with a mix of discrete and
continuous random variables.

For both problems, we compared BaMC, Simulated Annealing
with an exponential schedule and with the schedule used
in (Lundy and Mees 1986), which we call the
Lundy-Mees schedule, as well as a lightweight implementation
of Metropolis-Hastings (Wingate, Stuhlmüller, and Goodman
2011) adapted for MAP search, as a baseline for the
comparison. In Figures 1 and 2 the solid lines correspond to the
medians, and the dashed lines to the 25% and 75% quantiles,
of MAP estimates produced by each of the algorithms over
50 runs of 4000 iterations each. For each annealing schedule of
SA we kept only the lines corresponding to the empirically best
annealing rate (chosen from the list 0.8, 0.85,
0.9, 0.95). In both case studies, BaMC consistently outperformed
the other algorithms, finding high-probability MAP
estimates faster and with less variability between runs.

In addition, Figure 3 visualizes a single run of BaMC on
HMM. The solid black line shows the produced MAP estimates,
the light-blue lines are weights of individual samples,
and the bright blue line is the smoothed median of individual
sample weights. One can see that the smoothed median
approaches the MAP estimate as the sampling goes on,
reflecting the fact that the BaMC samples gradually converge to a
small set of high-probability assignments.

Figure 3: A single run of BaMC on HMM.


Discussion 

In this paper, we introduced BaMC, a search algorithm for
fast MAP estimation. The algorithm is based on MCTS but
differs in a number of important ways. In particular, the
algorithm can search for MAP in models with any combination
of variable types, and does not have any parameters that
have to be tuned for a particular problem domain or
configuration. As a part of BaMC we introduced open randomized
probability matching, an extension of randomized probability
matching to arbitrary variable types.

BaMC is simple and straightforward to implement both
for MAP estimation in probabilistic programs, and for
stochastic optimization in general. Empirical evaluation
showed that BaMC outperforms Simulated Annealing despite
the ability to tune the annealing schedule and rate in
the latter. BaMC coped well with cases of both finite and
infinite continuous variables present in the same problem.

Full analysis of algorithm properties and convergence is
still a subject of ongoing work. Conditions under which
BaMC converges, as well as the convergence rate, in particular
in the continuous case, still need to be established,
and may shed light on the boundaries of applicability of the
algorithm. On the other hand, techniques used in BaMC, in
particular open randomized probability matching, extend
beyond MAP estimation and stochastic optimization, and may
constitute a base for a more powerful search approach in
continuous and mixed spaces in general.